Weighted minimizer sampling improves long read mapping
Similar resources
Incorporating sequence quality data into alignment improves DNA read mapping
New DNA sequencing technologies have achieved breakthroughs in throughput, at the expense of higher error rates. The primary way of interpreting biological sequences is via alignment, but standard alignment methods assume the sequences are accurate. Here, we describe how to incorporate the per-base error probabilities reported by sequencers into alignment. Unlike existing tools for DNA read map...
Improved Indirect Photon Mapping with Weighted Importance Sampling
This paper offers a novel approach to the indirect photon mapping method. The placement of photons acting as virtual light sources is regarded as a cheap sampling scheme, allowing for the reuse of a complete shooting path at the cost of a single shadow ray. To counter its shortcomings, the variance reduction technique called weighted importance sampling is applied. This allows for the...
Accurate Long Read Mapping using Enhanced Suffix Arrays
With the rise of high throughput sequencing, new programs have been developed for dealing with the alignment of a huge amount of short read data to reference genomes. Recent developments in sequencing technology allow longer reads, but the mappers for short reads are not suited for reads of several hundreds of base pairs. We propose an algorithm for mapping longer reads, which is based on chain...
Comparing fixed sampling with minimizer sampling when using k-mer indexes to find maximal exact matches
Bioinformatics applications and pipelines increasingly use k-mer indexes to search for similar sequences. The major problem with k-mer indexes is that they require lots of memory. Sampling is often used to reduce index size and query time. Most applications use one of two major types of sampling: fixed sampling and minimizer sampling. It is well known that fixed sampling will produce a smaller ...
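The two sampling schemes named in this abstract can be illustrated with a minimal sketch. It assumes plain lexicographic ordering of k-mers and leftmost tie-breaking, which the abstract does not specify; the function names `fixed_sampling` and `minimizer_sampling` and the parameters `step` and `w` are hypothetical and chosen only for illustration.

```python
def kmers(seq, k):
    """Return (position, k-mer) pairs for every k-mer in seq."""
    return [(i, seq[i:i + k]) for i in range(len(seq) - k + 1)]


def fixed_sampling(seq, k, step):
    """Keep the k-mers whose start position is a multiple of `step`."""
    return [(i, km) for i, km in kmers(seq, k) if i % step == 0]


def minimizer_sampling(seq, k, w):
    """Keep the smallest k-mer in each window of w consecutive k-mers.

    Assumption: k-mers are compared lexicographically and ties are
    broken by taking the leftmost occurrence in the window.
    """
    all_kmers = kmers(seq, k)
    chosen = set()
    for start in range(len(all_kmers) - w + 1):
        window = all_kmers[start:start + w]
        chosen.add(min(window, key=lambda t: (t[1], t[0])))
    return sorted(chosen)


if __name__ == "__main__":
    seq = "ACGTACGT"
    print(fixed_sampling(seq, k=3, step=2))
    print(minimizer_sampling(seq, k=3, w=2))
```

Fixed sampling picks positions independently of the sequence content, whereas minimizer sampling picks positions that depend on the k-mers in each window, so two sequences sharing a long enough exact match select the same k-mers inside it.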
Consistent Weighted Sampling
We describe an efficient procedure for sampling representatives from a weighted set such that for any weightings S and T, the probability that the two choose the same sample is equal to the Jaccard similarity between them: $\Pr[\mathrm{sample}(S) = \mathrm{sample}(T)] = \frac{\sum_x \min(S(x), T(x))}{\sum_x \max(S(x), T(x))}$, where sample(S) is a pair (x, y) with 0 < y ≤ S(x). The sampling process takes expected computation li...
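As a small worked example, the weighted Jaccard similarity on the right-hand side of the identity above can be computed directly. This sketch only evaluates that ratio; it is not the sampling procedure the abstract describes. The function name `weighted_jaccard` and the dictionary representation of weightings are assumptions made for illustration.

```python
def weighted_jaccard(s, t):
    """Sum of element-wise minima divided by sum of element-wise maxima.

    Assumption: a weighting is a dict mapping items to non-negative
    weights, and items missing from a dict have weight 0.
    """
    keys = set(s) | set(t)
    numerator = sum(min(s.get(x, 0.0), t.get(x, 0.0)) for x in keys)
    denominator = sum(max(s.get(x, 0.0), t.get(x, 0.0)) for x in keys)
    return numerator / denominator if denominator > 0 else 0.0


if __name__ == "__main__":
    S = {"a": 2, "b": 1}
    T = {"a": 1, "c": 1}
    # minima: a -> 1, b -> 0, c -> 0 (sum 1); maxima: a -> 2, b -> 1, c -> 1 (sum 4)
    print(weighted_jaccard(S, T))  # 0.25
```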
Journal
Journal title: Bioinformatics
Year: 2020
ISSN: 1367-4803,1460-2059
DOI: 10.1093/bioinformatics/btaa435